-
Notifications
You must be signed in to change notification settings - Fork 10
Feature: Basic Agentic Pick and Place - reimplemented manipulation skills #1237
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: dev
Are you sure you want to change the base?
Conversation
…lues have tiny errors that can get out of joint limit range
Greptile OverviewGreptile SummaryThis PR adds an agent-invokable skill layer to manipulation by converting The changes fit into the existing architecture by leaning on Confidence Score: 2/5
Important Files Changed
Sequence DiagramsequenceDiagram
autonumber
participant User as Human user
participant HI as human_input (CLI)
participant Agent as llm_agent
participant Skills as SkillCoordinator
participant Manip as ManipulationModule (SkillModule)
participant WM as WorldMonitor
participant Planner as RRTConnectPlanner
participant CC as ControlCoordinator
participant Task as Trajectory Task (JointTrajectoryController)
User->>HI: type "pick up the cup"
HI->>Agent: user message
Agent->>Skills: select skill + args
Skills->>Manip: pick(object_name)
Manip->>WM: refresh_obstacles(min_duration)
WM-->>Manip: obstacles added
Manip->>Manip: _generate_grasps_for_pick()
alt GraspingModule RPC wired
Manip->>Manip: get_rpc_calls("GraspingModule.generate_grasps")
Manip-->>Manip: grasp candidates (intended)
else fallback heuristic
Manip->>WM: list_cached_detections()
WM-->>Manip: detections snapshot
Manip-->>Manip: heuristic grasp pose
end
Manip->>Planner: plan_joint_path(start, goal)
Planner-->>Manip: JointPath
Manip->>Manip: generate JointTrajectory
Manip->>CC: task_invoke(task, "execute", {trajectory})
CC->>Task: execute(trajectory)
Task-->>CC: accepted
CC-->>Manip: result
loop wait for completion
Manip->>CC: task_invoke(task, "get_status" or "get_state")
CC->>Task: get_status()/get_state()
Task-->>CC: TrajectoryStatus / None
CC-->>Manip: status
end
Manip-->>Skills: streamed progress strings
Skills-->>Agent: tool results
Agent-->>HI: assistant response
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
9 files reviewed, 3 comments
| try: | ||
| generate = self.get_rpc_calls("GraspingModule.generate_grasps") | ||
| result_str = generate(object_name, object_id, True) | ||
| # If generate_grasps returned actual PoseArray via the grasps port, | ||
| # we need to get the poses. For now, check if it returned an error string. | ||
| if isinstance(result_str, str) and "No" in result_str: | ||
| logger.info(f"GraspGen returned: {result_str}, falling back to heuristic") | ||
| else: | ||
| # GraspGen succeeded — get poses from the grasps port or RPC | ||
| logger.info(f"GraspGen result: {result_str}") | ||
| # Try to get the grasp poses via RPC | ||
| try: | ||
| get_grasps = self.get_rpc_calls("GraspingModule.get_latest_grasps") | ||
| grasp_poses: PoseArray | None = get_grasps() | ||
| if grasp_poses and len(grasp_poses.poses) > 0: | ||
| return list(grasp_poses.poses) | ||
| except Exception: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Invalid Grasping RPC usage
_generate_grasps_for_pick() calls self.get_rpc_calls("GraspingModule.get_latest_grasps"), but rpc_calls only declares GraspingModule.generate_grasps (manipulation_module.py:120-123) and GraspingModule doesn’t implement any get_latest_grasps RPC (it publishes to an Out[PoseArray] port instead; see dimos/manipulation/grasping/grasping.py:37-106). This will raise ValueError at runtime on the “GraspGen succeeded” path.
Additional Comments (1)
This switched to |
|
i assume i need hardware to test this |
| - Tier 2 (@skill): Individually testable actions (move_to_pose, open_gripper, etc.) | ||
| - Tier 3 (@skill): Composed behaviors (pick, place, pick_and_place) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Semantially not sure the difference between how these are explained. both can be called by agents. Maybe better to clarify one long horizon and one short horizon "functions" given that there will rarely be failure during an open gripper action, but more potential failure for a full pick and place
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense. Short - Long Horizon is a better classification.
The distinction here was just for readability and code organization.
| objects: In[list[DetObject]] | ||
|
|
||
| # RPC calls for GraspGen integration (resolved at runtime if modules are deployed) | ||
| rpc_calls: list[str] = [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be deprecated with @jeff-hykin 's RPC stuff but will let it pass for now & in interest of release timeline
| self._init_joints_captured = False | ||
|
|
||
| # Last pick position: stored during pick so place_back() can return the object | ||
| self._last_pick_position: tuple[float, float, float] | None = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reason these aren't typed. Makes development harder without guaranteed types / typechecking
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Coordinator client interfaces with hardware much loser. I will add proper type checking to the ControlCoordinator API
| # Snapshotted detections from the last scan_objects/refresh call. | ||
| # The live detection cache is volatile (labels change every frame), | ||
| # so pick/place use this stable snapshot instead. | ||
| self._detection_snapshot: list[dict[str, object]] = [] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
here also idk what object means
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should have been an Object. A confusion with object vs Object. will fix.
| def _make_pose( | ||
| self, x: float, y: float, z: float, roll: float, pitch: float, yaw: float | ||
| ) -> Pose: | ||
| """Construct a Pose from position and Euler angles (radians).""" | ||
| from dimos.msgs.geometry_msgs import Pose, Vector3 | ||
| from dimos.utils.transform_utils import euler_to_quaternion | ||
|
|
||
| orientation = euler_to_quaternion(Vector3(roll, pitch, yaw)) | ||
| return Pose(position=Vector3(x, y, z), orientation=orientation) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are we doing this
| def _wait_for_trajectory_completion( | ||
| self, robot_name: RobotName | None = None, timeout: float = 60.0, poll_interval: float = 0.2 | ||
| ) -> bool: | ||
| """Wait for trajectory execution to complete. | ||
|
|
||
| Polls the coordinator task state via task_invoke. Falls back to waiting | ||
| for the trajectory duration if the coordinator is unavailable. | ||
|
|
||
| Args: | ||
| robot_name: Robot to monitor | ||
| timeout: Maximum wait time in seconds | ||
| poll_interval: Time between status checks | ||
|
|
||
| Returns: | ||
| True if trajectory completed successfully | ||
| """ | ||
| robot = self._get_robot(robot_name) | ||
| if robot is None: | ||
| return True | ||
| rname, _, config, _ = robot | ||
| client = self._get_coordinator_client() | ||
|
|
||
| if client is None or not config.coordinator_task_name: | ||
| # No coordinator — wait for trajectory duration as fallback | ||
| traj = self._planned_trajectories.get(rname) | ||
| if traj is not None: | ||
| logger.info(f"No coordinator status — waiting {traj.duration:.1f}s for trajectory") | ||
| time.sleep(traj.duration + 0.5) | ||
| return True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be handled by @paul-nechifor skill coordinator stuff. We should have a universal and Generic set of "Skills" or "Processes" within dimOS and all governed by the same coordinator and manager. Fine for now since not done yet.
- dimOS skills are functions that agents can call
- dimOS processes are short-mid-long running tasks typically triggered by skill calls (or directly)
| def _find_object_in_detections( | ||
| self, object_name: str, object_id: str | None = None | ||
| ) -> dict[str, object] | None: | ||
| """Find an object in the detection snapshot by name or ID. | ||
|
|
||
| Uses the snapshot taken during the last scan_objects/refresh call, | ||
| not the volatile live cache (which changes labels every frame). | ||
|
|
||
| Args: | ||
| object_name: Name/label to search for | ||
| object_id: Optional specific object ID | ||
|
|
||
| Returns: | ||
| Detection dict with position info, or None | ||
| """ | ||
| detections = self._detection_snapshot | ||
| if not detections: | ||
| logger.warning("No detection snapshot — call scan_objects() first") | ||
| return None | ||
|
|
||
| for det in detections: | ||
| if object_id and str(det.get("object_id", "")) == str(object_id): | ||
| return dict(det) | ||
| name = str(det.get("name", "")).lower() | ||
| if object_name.lower() in name or name in object_name.lower(): | ||
| return dict(det) | ||
|
|
||
| available = [str(d.get("name", "?")) for d in detections] | ||
| logger.warning(f"Object '{object_name}' not found in snapshot. Available: {available}") | ||
| return None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is pretty horrible and caused by the lack of type i flagged above on this detections dict list thing. We have a detection type that you would be able to just do this in one line. No need for dict because 2DDetection has a field string so you can for any detection check detection.string
detections_list = List[2DDetection]
for det in detections_list:
...
And even better we just define a type 2DDetectionList of 2DDetectionDB which has search and __iter__ methods.
Only reason this may be acceptable to PR is that the above is undergoing a refactor from @leshy so idk the current state.
| def go_home(self, robot_name: str | None = None) -> Generator[str, None, None]: | ||
| """Move the robot to its home/observe joint configuration. | ||
|
|
||
| Opens the gripper and moves to the predefined home position. | ||
|
|
||
| Args: | ||
| robot_name: Robot to move (only needed for multi-arm setups). | ||
| """ | ||
| robot = self._get_robot(robot_name) | ||
| if robot is None: | ||
| yield "Error: Robot not found" | ||
| return | ||
| _, _, config, _ = robot | ||
|
|
||
| if config.home_joints is None: | ||
| yield "Error: No home_joints configured for this robot" | ||
| return | ||
|
|
||
| yield "Opening gripper..." | ||
| self._set_gripper_position(0.85, robot_name) | ||
| time.sleep(0.5) | ||
|
|
||
| yield "Planning motion to home position..." | ||
| if not self.plan_to_joints(config.home_joints, robot_name): | ||
| yield "Error: Failed to plan path to home position" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah i assume @paul-nechifor 's stuff will solve for this generator / yielding requirement right now so fine for now
| def pick( | ||
| self, | ||
| object_name: str, | ||
| object_id: str | None = None, | ||
| robot_name: str | None = None, | ||
| ) -> Generator[str, None, None]: | ||
| """Pick up an object by name using grasp planning and motion execution. | ||
|
|
||
| Scans the scene, generates grasp poses (via GraspGen or heuristic fallback), | ||
| plans collision-free approach/grasp/retract motions, and executes them. | ||
|
|
||
| Args: | ||
| object_name: Name of the object to pick (e.g. "cup", "bottle", "can"). | ||
| object_id: Optional unique object ID from perception for precise identification. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@paul-nechifor Flagging for you since skills like this are helpful to see what we're solving for on agents
You only need a realsense camera to test this PR. All the robot hardware can be simulated using the coordinator-mock blueprint |
Problem
ManipulationModule had no skill layer — only low-level RPCs requiring a custom IPython client.
No way for an LLM agent to invoke pick/place/move skills through the framework.
RRT planner rejected valid joint states due to floating-point drift past URDF limits.
Solution
Added 11
@skill()methods to ManipulationModule (pick, place, move_to_pose, scan_objects, etc.).Changed base class to
SkillModuleso skills auto-register with agents viaautoconnect().Created
xarm-perception-agentblueprint composing xarm_perception + llm_agent + human_input.Added detection snapshot so pick uses stable labels instead of volatile live cache.
Added
limit_eps=1e-3tolerance to RRT planner joint limit validation.Removed ManipulationClient and
run_*RPC wrappers — agent CLI replaces them.CoordinatorClient updated to route execution through
task_invokeinstead of removed RPCs.Breaking Changes
ManipulationClientdeleted — usedimos run xarm-perception-agentor direct RPC.run_pick/run_place/run_place_back/run_go_initRPCs removed from ManipulationModule.How to Test
dimos run coordinator-mockthendimos run xarm-perception-agentscan the scene— verify objects listed with 3D positionspick up the <object>— verify approach, grasp, retract sequenceplace it back— verify placement at original pick positiondimos agentspycloses DIM-351
closes DIM-419